Probabilistic Data Analysis with Probabilistic Programming
Abstract
Probabilistic techniques are central to data analysis, but different approaches can be difficult to apply, combine, and compare. This paper introduces composable generative population models (CGPMs), a computational abstraction that extends directed graphical models and can be used to describe and compose a broad class of probabilistic data analysis techniques. Examples include hierarchical Bayesian models, multivariate kernel methods, discriminative machine learning, clustering algorithms, dimensionality reduction, and arbitrary probabilistic programs. We also demonstrate the integration of CGPMs into BayesDB, a probabilistic programming platform that can express data analysis tasks using a modeling language and a structured query language. The practical value is illustrated in two ways. First, CGPMs are used in an analysis that identifies satellite data records which probably violate Kepler’s Third Law, by composing causal probabilistic programs with non-parametric Bayes in under 50 lines of probabilistic code. Second, for several representative data analysis tasks, we report on lines of code and accuracy measurements of various CGPMs, plus comparisons with standard baseline solutions from Python and MATLAB libraries.
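To make the satellite example concrete, the following is a minimal sketch of the physical relationship involved, not the paper's CGPM-based analysis. It computes the orbital period implied by Kepler's Third Law, T = 2π·sqrt(a³/μ), and flags records whose reported period deviates from it by more than a fixed relative tolerance. The record field names (`apogee_km`, `perigee_km`, `period_minutes`), the 5% tolerance, and the sample records are all assumptions made for illustration; the paper instead composes a causal probabilistic program with a non-parametric Bayesian model, which this deterministic check does not reproduce.

```python
import math

# Physical constants for Earth (standard published values).
MU_EARTH_KM3_S2 = 398600.4418  # standard gravitational parameter, km^3/s^2
EARTH_RADIUS_KM = 6378.137     # equatorial radius, km


def kepler_period_minutes(semi_major_axis_km):
    """Orbital period implied by Kepler's Third Law, in minutes."""
    seconds = 2.0 * math.pi * math.sqrt(semi_major_axis_km ** 3 / MU_EARTH_KM3_S2)
    return seconds / 60.0


def flag_violations(records, rel_tol=0.05):
    """Return names of records whose reported period is inconsistent
    with the period implied by their apogee and perigee altitudes.

    The field names and the tolerance are hypothetical, chosen for this sketch.
    """
    flagged = []
    for rec in records:
        # Semi-major axis from altitudes above the Earth's surface.
        a = EARTH_RADIUS_KM + 0.5 * (rec["apogee_km"] + rec["perigee_km"])
        expected = kepler_period_minutes(a)
        if abs(rec["period_minutes"] - expected) > rel_tol * expected:
            flagged.append(rec["name"])
    return flagged


# Made-up records for illustration only.
records = [
    {"name": "geo-like", "apogee_km": 35786.0, "perigee_km": 35786.0,
     "period_minutes": 1436.1},
    {"name": "suspect", "apogee_km": 35786.0, "perigee_km": 35786.0,
     "period_minutes": 142.0},
]

print(flag_violations(records))
```

A hard threshold like this gives a yes/no answer, whereas the analysis described in the abstract identifies records that probably violate the law, which requires a model of the noise around the Keplerian prediction.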
Journal title:
- CoRR
Volume abs/1608.05347
Publication date 2016